Supplementary Material about distribution modelling of S. alveolata occurrence around Ireland
- 1 Summary - Distribution Modelling of Sabellaria alveolata reefs around Ireland
- 2 Packages
- 3 Import and Prepare data
- 4 Comparison of alternative models (GLM, GAM, RF, BRT, MARS, SVM) using default settings and the ‘sdm’ package
- 5 Main Results from Boosted Regression Tree using Spatial blocks for cross-validation
- 5.1 Spatial autocorrelation (SAC) and autocorrelation range in predictors
- 5.2 Generate Spatial Blocks based on environmental autocorrelation range
- 5.3 Fit BRT using spatial blocks for cross-validation
- 5.4 Performance
- 5.5 Initial and residual SAC
- 5.6 Importance of the variables and shape of the responses
- 5.7 Fitted functions with bootstrap for confidence interval
- 5.8 Interactions between variables
- 5.9 Spatial predictions
- 6 Session info
1 Summary - Distribution Modelling of Sabellaria alveolata reefs around Ireland
This document provides Supplementary Material to the paper Specific niche requirements of an intertidal ecosystem engineer promote multidecadal stability of its geographical range in the face of climate warming by Firth et al. It details the different steps of the distribution modelling, and includes complementary information to the main paper. The document is organised in sections as follows:
Introduction (this section)
Importing R libraries required for analyses
Data Preparation and Variable Selection. This section includes information about environmental variables resolution and pre-treatment prior to statistical modelling.
Preliminary exploration of the sensitivity of results to alternative modelling approaches. This section compares alternative modelling approaches (GLM, GAM, RF, BRT, MARS and SVM, applied with default settings using the ‘sdm’ package) with a Boosted Regression Tree tuned according to Elith et al. (2008) and fitted using the ggBRT package. Note that this section justifies our choice to focus on results from Boosted Regression Trees (BRT), which yielded better predictive capacity and consistent results relative to the other approaches fitted with default settings using the ‘sdm’ package. [Elith, J., Leathwick, J.R. and Hastie, T. (2008), A working guide to boosted regression trees. Journal of Animal Ecology, 77: 802-813. https://doi.org/10.1111/j.1365-2656.2008.01390.x ]
Presentation of main results from Boosted Regression Tree Presence/Absence Modelling. This section expands on the results presented in the main paper. Note that this section ends with a specific subsection dedicated to spatial autocorrelation in the initial dataset and in model residuals.
The final section lists the R packages and R session information used to produce this R Markdown document.
2 Packages
2.1 Data handling
library(dplyr)
library(purrr)
library(magrittr)
require(readxl) # for the excel_sheets() function
2.2 Analysis
# Collinearity
library(usdm)
# SDM
library(sdm)
# Brt
library(ggBRT)
library(gbm)
# Random Forest
library(randomForest)
# AUC, Thresholding etc...
library(ROCR)
library(PresenceAbsence)
# CV and autocorrelation
library(blockCV)
library(automap)
library(ncf)
library(sp)
library(sf)
2.3 Graphics
library(ggplot2)
library(ggcorrplot)
library(ggridges)
library(gridExtra)
library(viridis)
library(scales)
library(tidyverse)
library(maps)
library(mapdata)
library(fields)
library(rgdal)
library(RColorBrewer)
library(ggpointdensity)
3 Import and Prepare data
Presence / Absence data correspond to all available field surveys post-2013 as presented in the paper. Environmental data were derived from available datasets as described below:
Substrate Substrate information, which was extracted from EMODNET seabed habitats, corresponds to the Folk sediment triangle and the hierarchy of Folk classification (EMODNET data). Factors are coded as follows [EMODNET levels 6-9 were removed]: 2 = Sand; 3 = Coarse substrate; 4 = Mixed sediment; 5 = Rock & boulders; 11 = Mud; 12 = Sandy Mud; 13 = Muddy Sand.
Fetch Wave fetch estimates, i.e. local coastline exposure to ocean waves, are after Burrows et al. (2008). Original data horizontal resolution is 100 m.
Burrows et al. (2008). “Wave exposure indices from digital coastlines and the prediction of rocky shore community structure.” Marine Ecology Progress Series, 353, 1-12.
Tidal Amplitude was extracted from OTIS-OSU (https://www.tpxo.net/home). Tides were modelled based on Egbert and Erofeeva (2002) - “Efficient Inverse Modeling of Barotropic Ocean Tides.” Journal of Atmospheric and Oceanic Technology, 19(2), 183–204 - and Egbert et al. (2010) - “Assimilation of altimetry data for nonlinear shallow-water tides: Quarter-diurnal tides of the Northwest European Shelf.” Continental Shelf Research, 30(6), 668–679. Horizontal resolution is ~10 km.
Ocean front metrics Ocean front metrics correspond to those described in Miller et al. (2015). Climatological statistics were derived from daily estimates over the 2013-2018 period. Horizontal resolution of the original data is ~2.75 km.
Miller et al. (2015). “Basking sharks and oceanographic fronts: quantifying associations in the north-east Atlantic.” Functional Ecology, 29, 1099-1109.
Air temperature and Wind Speed Air temperature and wind speed are from the Météo France ARPEGE model as described in Déqué et al. (1994). Horizontal resolution of the original data is ~7.4 km in longitude and ~11 km in latitude. Climatological statistics were derived from hourly estimates over the 2012-2018 period.
Déqué et al. (1994). “The ARPEGE/IFS atmosphere model: a contribution to the French community climate modelling.” Climate Dynamics, 10, 249-266.
Salinity and Sea Surface Temperature (SST) were from Amo et al. (2019). Horizontal resolution of the original data is ~3km. Climatological statistics were derived from monthly mean estimates over the 2013-2018 period. Amo et al. (2019). “Product user manual - Atlantic - Iberian Biscay Irish - Ocean Physics Analysis and Forecast Product: IBI_ANALYSIS_FORECAST_PHYS_005_001”. Available from Copernicus Marine Environment Monitoring Service.
Stratification Index was derived by combining EMODNET bathymetry with vertically averaged current velocity from Amo et al. (2019). Data resolution is as described above for salinity and SST.
Significant Wave Height was estimated as described in Bricheno and Wolf (2018). Data horizontal resolution is 0.083°. Climatological statistics were produced as in Bricheno and Wolf (2018).
Bricheno and Wolf (2018). “Future wave conditions of Europe, in response to high‐end climate change scenarios.” Journal of Geophysical Research: Oceans 123, 8762-8791.
# Define spatial resolution, input and output directories and file names
DataRes <- c('5KM', NA)[1]
input_dir <- 'InputData'
output_dir <- paste0('./Outputs', DataRes)
sdm_occ_file <- "sdm_occ_n10_test30_thin_RandomCrossVal.Rdata"
# occurrence of Sabellaria & environmental data
sab_env <- read.csv(file.path(input_dir,paste0("sab_env+sub_",DataRes,"data_IE.csv")))
rm_row <- nrow(sab_env)
sab_env <- sab_env[!is.na(sab_env$Substrate),]
sab_env <- droplevels(sab_env)
rm_row <- rm_row - nrow(sab_env)
xtabs(~sab_env$Substrate)
## sab_env$Substrate
## 2 3 4 5 11 12 13
## 72 66 4 74 6 14 112
sab_env$Substrate <- as.factor(sab_env$Substrate)
# levels(sab_env$Substrate) <- c('Sand','Coarse substrate','Mixed sediment','Rock & boulders','Mud','Sandy Mud','Muddy sand')
names(sab_env['Substrate']) <- "Substrate" # "Substrate_type"
print(paste('Removed', rm_row, 'data points due to lack of substrate information around Connemara coastline'))
## [1] "Removed 5 data points due to lack of substrate information around Connemara coastline"
3.1 Remove collinear environmental variables
# Remove collinear variables
coord_bin <- sab_env %>%
dplyr::select(Longitude_5km, Latitude_5km )
# Initial correlation
sab_env %>%
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR"), -Substrate, -FrontGradDens_min, -FrontSide_min, -FrontSide_max, -FrontPersist_min) %>%
cor(., method = "spearman") %>%
ggcorrplot(., hc.order = TRUE, type = "upper", lab = FALSE, method = "circle", colors = c("tomato2", "white","springgreen3"))
Figure 3.1:
sab_env <- sab_env %>%
dplyr::select(-FrontPersist_min, -FrontPersist_p10, -FrontSide_min, -FrontSide_p10, -FrontGradDens_min, -FrontGradDens_p10,-FrontPersist_p90, -FrontGradDens_p90, -FrontDist_p90, -FrontGradDens_p05, -FrontPersist_p05, -FrontSide_max, -FrontSide_p95, -FrontSide_p90, -FrontSide_p05, -FrontSide_p50, -FrontSide_sd, -FrontDist_max, -FrontDist_p95, -FrontPersist_max, -FrontPersist_p95, -FrontGradDens_p95, -FrontPersist_p95, -FrontPersist_p50, -FrontGradDens_p50, -FrontGradDens_sd, -FrontGradDens_max, -FrontDist_min, -FrontDist_p10, -FrontDist_p50, -FrontGradDens_p90, -FrontPersist_sd, -contains("_p50"),-WaveHeight_p90, -WaveHeight_p99, -WaveHeight_p10, -WaveHeight_p90, -WaveHeight_max, -WaveHeight_p95, -AirTemp_p90, -AirTemp_max, -AirTemp_p10, -SST_p90, -SST_p10, -SST_max, -FrontGradDens_mean, -FrontDist_sd, -MixedLayerDepth, -CloudCover, -AirTemp_mean,-AirTemp_min,-SurfCurrent, -Aspect,-SpringChlA,-Longitude_5km, -Latitude_5km )
# Check remaining correlation
sab_env %>%
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR"), -Substrate) %>%
cor(., method = "spearman") %>%
ggcorrplot(., hc.order = TRUE, type = "upper", lab = FALSE, method = "circle", colors = c("tomato2", "white","springgreen3"))
Figure 3.2:
# Check remaining collinearity
sab_env %>%
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR")) %>%
vif(.) %>%
arrange(desc(VIF))
## Variables VIF
## 1 Fetch 18.866666
## 2 FrontPersist_mean 9.238758
## 3 WindSpeed 8.507063
## 4 Salinity 7.514083
## 5 WaveHeight_mean 7.254731
## 6 FrontDist_mean 6.725121
## 7 TideAmp 5.645464
## 8 SST_mean 4.024994
## 9 FrontDist_p05 3.771220
## 10 SST_min 2.973670
## 11 FrontSide_mean 2.955355
## 12 StratifIndex 2.692326
## 13 Substrate NA
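The variance inflation factor reported above is 1/(1 − R²), where R² comes from regressing each predictor on all the remaining predictors; values above ~10 (such as Fetch here) are commonly taken to flag problematic collinearity. A minimal base-R sketch on synthetic data (not the Sabellaria dataset):

```r
# VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the others
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(100)                  # independent of the others
dat <- data.frame(x1, x2, x3)

vif_one <- function(var, data) {
  r2 <- summary(lm(reformulate(setdiff(names(data), var), response = var),
                   data = data))$r.squared
  1 / (1 - r2)
}
vifs <- sapply(names(dat), vif_one, data = dat)
round(vifs, 2)   # x1 and x2 have large VIFs, x3 is close to 1
```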
3.1 Note
- Similar results are obtained if we take WaveHeight_p95 or WaveHeight_mean.
- Similar results are also obtained if we take StratifIndex or SurfCurrent, as StratifIndex was calculated according to Crips (1989) as \(SI=log_{10}(\frac{H}{u^3})\), where \(H\) = water depth and \(u\) = surface current velocity.
- Remove Fetch, which has by far the highest VIF.
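As a worked example of the stratification index above (the depth and current values here are illustrative, not taken from the dataset):

```r
# Stratification index SI = log10(H / u^3)
# H: water depth (m), u: surface current velocity (m/s)
strat_index <- function(H, u) log10(H / u^3)

# e.g. H = 50 m and u = 0.5 m/s: 50 / 0.125 = 400, so SI = log10(400) ~ 2.602
si <- strat_index(H = 50, u = 0.5)
round(si, 3)
```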
# Check remaining correlation
sab_env %>%
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR"), -Fetch, -Substrate) %>%
cor(., method = "spearman") %>%
ggcorrplot(., hc.order = TRUE, type = "upper", lab = FALSE, method = "circle", colors = c("tomato2", "white","springgreen3"))
Figure 3.3:
# Check remaining collinearity
sab_env %>%
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR"), -Fetch) %>%
vif(.) %>%
arrange(desc(VIF))
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
## Warning in Ops.factor(r, 2): '^' not meaningful for factors
## Variables VIF
## 1 WindSpeed 8.057407
## 2 WaveHeight_mean 7.142121
## 3 Salinity 7.111227
## 4 TideAmp 5.551628
## 5 FrontDist_mean 5.348088
## 6 FrontPersist_mean 4.970886
## 7 FrontDist_p05 3.711769
## 8 SST_mean 3.676061
## 9 FrontSide_mean 2.932645
## 10 SST_min 2.805090
## 11 StratifIndex 2.691842
## 12 Substrate NA
3.2 Format dataset with final subset of environmental predictors
# Compute the occurrences from the SACFOR data
sab_env <- sab_env %>%
dplyr::mutate(occurrence = if_else(SACFOR.numerical > 0, 1, 0))
# Select the predictors
predictors <- sab_env %>%
# Select only the explicative variables
dplyr::select(-Sab.ID., -Location.name, -Surveyor.code, - County.number, -County, -Administrative.region, -Latitude, -Longitude, -Co.ordinate.accuracy, -Day, -Month, -Year, - Intertidal.Subtidal, -starts_with("SACFOR"), -Fetch,-occurrence) %>%
# Extract the names of the variables
colnames(.)
3.3 Thin absence / presence data
par(mfrow = c(1,2))
# Plot occurrence data
plot(sab_env$Longitude,sab_env$Latitude, pch= 19,cex=1.2,col='darkgreen', axes = F, xlab = 'Longitude', ylab = 'Latitude', type = 'n', main = 'All occurrence data')
maps::map('world', add=T, fill=T, col='beige', bg='steelblue'); axis(1); axis(2)
points(sab_env[which(sab_env$occurrence==1),]$Longitude,sab_env[which(sab_env$occurrence==1),]$Latitude, pch= 19,cex=1.2,col='darkgreen')
points(sab_env[which(sab_env$occurrence==0),]$Longitude,sab_env[which(sab_env$occurrence==0),]$Latitude, pch= 4,cex=1.2,col='blue' )
legend('topleft', legend = c('Pres.','Abs.'), pch = c(19,4), col = c('darkgreen','blue'))
# Define Spatial Bins Based on Raster
if(FALSE){load(file.path(input_dir,'IE_EnvRasterStack_2.75km.RData'))
sab_env$Lon_bin <- cut(sab_env$Longitude, seq(xmin(raStack), xmax(raStack),res(raStack)[1] ) )
sab_env$Lat_bin <- cut(sab_env$Latitude , seq(ymin(raStack), ymax(raStack),res(raStack)[2] ) )
}else{
sab_env$Lon_bin <- as.factor(as.character(coord_bin$Longitude_5km))
sab_env$Lat_bin <- as.factor(as.character(coord_bin$Latitude_5km))
}
temp <- aggregate(sab_env$occurrence, FUN = mean, by = list(Lon_bin=sab_env$Lon_bin, Lat_bin=sab_env$Lat_bin))
temp$x[temp$x > 0] = 1
summary(temp)
## Lon_bin Lat_bin x
## -10.115: 5 53.235 : 13 Min. :0.0000
## -6.063 : 5 53.135 : 8 1st Qu.:0.0000
## -8.463 : 5 55.185 : 8 Median :0.0000
## -8.963 : 5 55.235 : 7 Mean :0.4205
## -9.863 : 5 52.135 : 6 3rd Qu.:1.0000
## -10.065: 4 52.235 : 6 Max. :1.0000
## (Other):147 (Other):128
temp <- merge(x = sab_env, y = temp)
temp$dup <- duplicated(temp[,c('Lon_bin','Lat_bin')])
sab_env_thin <- temp[!temp$dup,!names(temp) %in% c('dup','Lon_bin','Lat_bin','occurrence')]
names(sab_env_thin)[ncol(sab_env_thin)] <- 'occurrence'
print(paste(nrow(sab_env) - nrow(sab_env_thin),'occurrence records removed through thinning.'))
## [1] "172 occurrence records removed through thinning."
plot(sab_env_thin$Longitude,sab_env_thin$Latitude, pch= 19,cex=1.2,col='darkgreen', axes = F, xlab = 'Longitude', ylab = 'Latitude', type = 'n', main = 'Thinned occurrence data' )
maps::map('world', add=T, fill=T, col='beige', bg='steelblue'); axis(1); axis(2)
points(sab_env_thin[which(sab_env_thin$occurrence==1),]$Longitude,sab_env_thin[which(sab_env_thin$occurrence==1),]$Latitude, pch= 19,cex=1.2,col='darkgreen')
points(sab_env_thin[which(sab_env_thin$occurrence==0),]$Longitude,sab_env_thin[which(sab_env_thin$occurrence==0),]$Latitude, pch= 4,cex=1.2,col='blue' )
points(temp[which(temp$dup),]$Longitude,temp[which(temp$dup),]$Latitude, pch= 3,cex= .6,col='red' )
legend('topleft', legend = c('Pres.','Abs.','Thining'), pch = c(19,4,3), col = c('darkgreen','blue','red'))
Figure 3.4:
# Summary of the dataset after thinning
summary(sab_env_thin)
## Sab.ID. Location.name Surveyor.code County.number
## 1-06 : 1 Doonbeg : 2 Min. :1.000 Min. : 1.00
## 10-05 : 1 Glasdrumman : 2 1st Qu.:1.000 1st Qu.: 8.00
## 10-10 : 1 Inch : 2 Median :1.000 Median :11.00
## 10-102 : 1 Abbey Island: 1 Mean :1.222 Mean :11.18
## 10-105 : 1 Annalong : 1 3rd Qu.:1.000 3rd Qu.:15.25
## 10-110 : 1 Ardfry Ho : 1 Max. :2.000 Max. :18.00
## (Other):170 (Other) :167
## County Administrative.region Latitude
## Donegal:32 Northern Ireland : 30 Min. :51.48
## Galway :28 Republic of Ireland:146 1st Qu.:52.31
## Cork :21 Median :53.40
## Down :18 Mean :53.53
## Kerry :13 3rd Qu.:54.58
## Antrim :11 Max. :55.29
## (Other):53
## Longitude Co.ordinate.accuracy Day Month
## Min. :-10.394 < 10m :114 Min. : 1.00 Min. : 1.000
## 1st Qu.: -9.241 < 1km : 21 1st Qu.: 4.00 1st Qu.: 2.000
## Median : -8.354 < 100m : 17 Median :12.00 Median : 5.000
## Mean : -8.041 <100m : 17 Mean :12.52 Mean : 4.768
## 3rd Qu.: -6.644 <10m : 4 3rd Qu.:19.00 3rd Qu.: 7.000
## Max. : -5.439 < 100m : 1 Max. :31.00 Max. :11.000
## (Other): 2 NA's :27 NA's :25
## Year Intertidal.Subtidal SACFOR.aplhabetical SACFOR.numerical
## Min. :2013 Intertidal:176 N :108 Min. :0.000
## 1st Qu.:2013 C : 28 1st Qu.:0.000
## Median :2014 A : 12 Median :0.000
## Mean :2015 P : 12 Mean :1.568
## 3rd Qu.:2016 F : 9 3rd Qu.:4.000
## Max. :2018 R : 5 Max. :6.000
## (Other): 2
## Fetch Salinity SST_mean SST_min
## Min. : 1928 Min. :28.43 Min. :10.56 Min. :4.086
## 1st Qu.:10988 1st Qu.:33.02 1st Qu.:11.29 1st Qu.:5.902
## Median :14976 Median :33.99 Median :11.59 Median :6.154
## Mean :14724 Mean :33.56 Mean :11.53 Mean :6.178
## 3rd Qu.:19204 3rd Qu.:34.58 3rd Qu.:11.82 3rd Qu.:6.526
## Max. :26011 Max. :35.05 Max. :12.10 Max. :7.619
##
## StratifIndex WindSpeed TideAmp WaveHeight_mean
## Min. :1.849 Min. : 9.256 Min. :0.4032 Min. :0.542
## 1st Qu.:3.630 1st Qu.:11.251 1st Qu.:1.6750 1st Qu.:1.085
## Median :4.086 Median :11.732 Median :1.9313 Median :1.298
## Mean :4.010 Mean :11.770 Mean :1.9088 Mean :1.420
## 3rd Qu.:4.435 3rd Qu.:12.383 3rd Qu.:2.2560 3rd Qu.:1.898
## Max. :6.024 Max. :13.744 Max. :3.7545 Max. :2.698
##
## FrontDist_mean FrontDist_p05 FrontPersist_mean FrontSide_mean
## Min. : 4.678 Min. :1.175 Min. :3.390e-06 Min. :-0.7076
## 1st Qu.: 7.382 1st Qu.:2.140 1st Qu.:6.620e-03 1st Qu.:-0.3791
## Median :10.592 Median :2.686 Median :1.245e-02 Median :-0.2839
## Mean :10.678 Mean :2.958 Mean :1.391e-02 Mean :-0.2738
## 3rd Qu.:13.568 3rd Qu.:3.571 3rd Qu.:2.118e-02 3rd Qu.:-0.1748
## Max. :18.564 Max. :6.634 Max. :3.615e-02 Max. : 0.1534
##
## Substrate occurrence
## 2 :48 Min. :0.0000
## 3 :49 1st Qu.:0.0000
## 4 : 4 Median :0.0000
## 5 :34 Mean :0.4205
## 11: 3 3rd Qu.:1.0000
## 12: 8 Max. :1.0000
## 13:30
# Number of presence and background records before thinning
print(paste("Before thinning we had", table(sab_env$occurrence)[1], "absences and", table(sab_env$occurrence)[2], "occurrences"))
## [1] "Before thinning we had 174 absences and 174 occurrences"
# Number of presence and background records after thinning
print(paste("After thinning we have", table(sab_env_thin$occurrence)[1], "absences and", table(sab_env_thin$occurrence)[2], "occurrences left"))
## [1] "After thinning we have 102 absences and 74 occurrences left"
# From now on, we will use the thinned data
sab_env <- sab_env_thin
4 Comparison of alternative models (GLM, GAM, RF, BRT, MARS, SVM) using default settings and the ‘sdm’ package
4.1 Performance of different models
set.seed(64)
sab_sdm_data <- sdm::sdmData(occurrence~., train=sab_env %>% select(occurrence, !!predictors))
sdm_occ <- sdm(occurrence~.,data=sab_sdm_data, methods=c('glm','gam','gbm','rf','svm', "mars"),test.percent=30, n=10, replication = 'cv', ncores = 2)
save(sdm_occ, file = file.path(output_dir, sdm_occ_file))
load(file.path(output_dir,sdm_occ_file))
# Summary of the sdm
sdm_occ
## class : sdmModels
## ========================================================
## number of species : 1
## number of modelling methods : 6
## names of modelling methods : glm, gam, brt, rf, svm, mars
## replicate.methods (data partitioning) : cross_validation
## number of replicates (each method) : 10
## toral number of replicates per model : 50 (per species)
## ------------------------------------------
## model run success percentage (per species) :
## ------------------------------------------
## method occurrence
## ----------------------
## glm : 100 %
## gam : 100 %
## brt : 100 %
## rf : 100 %
## svm : 100 %
## mars : 100 %
##
## ###################################################################
## model Mean performance (per species), using test dataset (generated using partitioning):
## -------------------------------------------------------------------------------
##
## ## species : occurrence
## =========================
##
## methods : AUC | COR | TSS | Deviance
## -------------------------------------------------------------------------
## glm : 0.84 | 0.62 | 0.68 | 1.88
## gam : 0.81 | 0.58 | 0.63 | 12.04
## brt : 0.88 | 0.67 | 0.74 | 1.06
## rf : 0.91 | 0.73 | 0.75 | 0.74
## svm : 0.91 | 0.73 | 0.78 | 0.91
## mars : 0.82 | 0.59 | 0.64 | 8.53
# ROC
roc(sdm_occ)
Figure 4.1:
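TSS in the table above is the true skill statistic, sensitivity + specificity − 1, computed after thresholding the predicted probabilities. A minimal sketch with made-up observations and predictions (the 0.5 threshold is chosen arbitrarily for illustration; the sdm package selects thresholds internally):

```r
# TSS = sensitivity + specificity - 1
obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)                # observed presence/absence
pred <- c(0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3) # predicted probabilities
thr  <- 0.5
bin  <- as.integer(pred >= thr)                  # thresholded predictions

sens <- sum(bin == 1 & obs == 1) / sum(obs == 1) # true positive rate
spec <- sum(bin == 0 & obs == 0) / sum(obs == 0) # true negative rate
tss  <- sens + spec - 1
c(sensitivity = sens, specificity = spec, TSS = tss)
```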
4.2 Importance of variables for alternative methods using SDM package
Variable importance (Naimi & Araújo 2016, Ecography):
- One method calculates the improvement in model performance when each variable is included, compared to when it is excluded, through a cross-validation procedure.
- Another method is a randomisation procedure that measures the correlation between the original predictions and predictions obtained after the variable under investigation has been randomly permuted. If a variable contributes strongly to the model, the predictions are more affected by the permutation and the correlation is lower; ‘1 – correlation’ can therefore be taken as a measure of variable importance.
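The permutation procedure can be sketched as follows; synthetic data and a plain linear model stand in for the fitted SDMs here, but the logic is the same:

```r
# '1 - correlation' permutation importance: permute one predictor,
# re-predict, and compare with the original predictions.
set.seed(42)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 * x1 + 0.1 * x2 + rnorm(n, sd = 0.3)  # x1 matters, x2 barely does
fit <- lm(y ~ x1 + x2)
pred_orig <- predict(fit)

perm_importance <- function(var, fit, data) {
  data[[var]] <- sample(data[[var]])            # permute the variable
  1 - cor(pred_orig, predict(fit, newdata = data))
}
dat <- data.frame(x1, x2)
imp <- sapply(c("x1", "x2"), perm_importance, fit = fit, data = dat)
round(imp, 3)   # x1 (strong effect) >> x2 (weak effect)
```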
# Variables importance
#---------------------
# Get the info on each runs
runs <- attributes(sdm_occ)$run.info
# Get the contribution of the variables of the different runs
var_contrib <- as.list(1:nrow(runs)) %>%
# Retrieve the importance of the variables for each run
purrr::map(.,~getVarImp(sdm_occ,id=.,wtest='test.dep')) %>%
# Format this output do get a df
purrr::map(., ~ .x %>% attributes(.) %>% extract2("varImportance")) %>%
# Bind the details on the corresponding run
purrr::map2_dfr(., as.list(1:nrow(runs)), ~ as.data.frame(.x) %>% mutate(method = runs[.y, "method"],replicationID = runs[.y, "replicationID"]))
contrib <- var_contrib %>%
group_by(method, variables) %>%
summarise_at(vars("corTest","AUCtest"), funs('mean' = mean(.), 'sd' = sd(.), 'se' = sqrt(var(., na.rm = T)/length(.)))) %>%
ungroup()
# AUCtest
#--------
var_order <- contrib %>%
group_by(variables) %>%
summarise(AUCtest_mean = mean(AUCtest_mean)) %>%
arrange(desc(AUCtest_mean)) %>%
pull(variables)
contrib <- contrib %>%
mutate(variables = factor(variables, levels = var_order, ordered = TRUE))
p_AUC <- ggplot(contrib, aes(x= variables, y = AUCtest_mean, col = method, dodge = method)) +
geom_pointrange(aes(ymin = AUCtest_mean - AUCtest_sd, ymax = AUCtest_mean + AUCtest_sd), position = position_dodge(width = 0.4)) +
xlab("") + ylab("Mean (± sd) variable contribution (AUC-based)") +
scale_color_manual(values = c("#404040","#bababa","lightcyan3","darkslateblue","orangered4","darkgoldenrod3"), "Method") +
theme(axis.text.x = element_text(face="bold", color="black",
size=12, angle=45, vjust = 1, hjust = 1),
axis.text.y = element_text( color="black",
size=12),
axis.ticks = element_blank())
# CorTest
#--------
var_order <- contrib %>%
group_by(variables) %>%
summarise(corTest_mean = mean(corTest_mean)) %>%
arrange(desc(corTest_mean)) %>%
pull(variables)
contrib <- contrib %>%
mutate(variables = factor(variables, levels = var_order, ordered = TRUE))
p_COR <- ggplot(contrib, aes(x= variables, y = corTest_mean, col = method, dodge = method)) +
geom_pointrange(aes(ymin = corTest_mean - corTest_sd, ymax = corTest_mean + corTest_sd), position = position_dodge(width = 0.4)) +
scale_color_manual(values = c("#404040","#bababa","lightcyan3","darkslateblue","orangered4","darkgoldenrod3"), "Method") +
xlab("") + ylab("Mean (± sd) variable contribution (Correlation-based)") +
theme(axis.text.x = element_text(face="bold", color="black",
size=12, angle=45, vjust = 1, hjust = 1),
axis.text.y = element_text( color="black",
size=12),
axis.ticks = element_blank())
grid.arrange(p_AUC,p_COR, ncol = 1)
Figure: variable importance (mean ± sd) for each method.
# AUCtest
#--------
var_order <- contrib %>%
group_by(variables) %>%
summarise(AUCtest_mean = mean(AUCtest_mean)) %>%
arrange(desc(AUCtest_mean)) %>%
pull(variables)
contrib <- contrib %>%
mutate(variables = factor(variables, levels = var_order, ordered = TRUE))
p_AUC <- ggplot(contrib, aes(x= variables, y = AUCtest_mean, col = method, dodge = method)) +
geom_pointrange(aes(ymin = AUCtest_mean - AUCtest_se, ymax = AUCtest_mean + AUCtest_se), position = position_dodge(width = 0.4)) +
xlab("") + ylab("Mean (± se) variable contribution (AUC-based)") +
scale_color_manual(values = c("#404040","#bababa","lightcyan3","darkslateblue","orangered4","darkgoldenrod3"), "Method") +
theme(axis.text.x = element_text(face="bold", color="black",
size=12, angle=45, vjust = 1, hjust = 1),
axis.text.y = element_text( color="black",
size=12),
axis.ticks = element_blank())
# CorTest
#--------
var_order <- contrib %>%
group_by(variables) %>%
summarise(corTest_mean = mean(corTest_mean)) %>%
arrange(desc(corTest_mean)) %>%
pull(variables)
contrib <- contrib %>%
mutate(variables = factor(variables, levels = var_order, ordered = TRUE))
p_COR <- ggplot(contrib, aes(x= variables, y = corTest_mean, col = method, dodge = method)) +
geom_pointrange(aes(ymin = corTest_mean - corTest_se, ymax = corTest_mean + corTest_se), position = position_dodge(width = 0.4)) +
scale_color_manual(values = c("#404040","#bababa","lightcyan3","darkslateblue","orangered4","darkgoldenrod3"), "Method") +
xlab("") + ylab("Mean (± se) variable contribution (Correlation-based)") +
theme(axis.text.x = element_text(face="bold", color="black",
size=12, angle=45, vjust = 1, hjust = 1),
axis.text.y = element_text( color="black",
size=12),
axis.ticks = element_blank())
grid.arrange(p_AUC,p_COR, ncol = 1)
Figure: variable importance (mean ± se) for each method.
4.3 Optimised settings for BRT models (based on Elith et al. 2008)
set.seed(64)
# Number of CV folds
n_folds <- 10
# Tree complexity
tc <- 5
# Learning rate
lr <- 0.001
# bag.fraction
bf <- 0.7
4.4 Performance of BRT models on random cross-validation (for comparison with SDM package results)
# Fit BRT using 10-fold random cross-validation
brt_rand <- gbm.step(data=sab_env, gbm.x = predictors,
gbm.y = which(names(sab_env) == "occurrence"),
family = "bernoulli",
tree.complexity = tc,
learning.rate = lr,
bag.fraction = bf,
prev.stratify = TRUE,
n.folds = n_folds,
keep.fold.models = TRUE,
keep.fold.vector = TRUE,
keep.fold.fit = TRUE,
verbose = FALSE)
##
##
## GBM STEP - version 2.9
##
## Performing cross-validation optimisation of a boosted regression tree model
## for occurrence and using a family of bernoulli
## Using 176 observations and 12 predictors
## creating 10 initial models of 50 trees
##
## folds are stratified by prevalence
## total mean deviance = 1.3609
## tolerance is fixed at 0.0014
## now adding trees...
Figure: BRT model fitting under random cross-validation.
4.5 Performance
# Extract the performance of the models
perf_rand <- ggPerformance(brt_rand)
perf_rand
## Model 1
## Total.Deviance 1.3608766
## Residual.Deviance 0.3395770
## Correlation 0.9300522
## AUC 0.9956000
## Per.Expl 75.0471868
## cvDeviance 0.7269154
## cvCorrelation 0.7534021
## cvAUC 0.9202800
## cvPer.Expl 46.5847687
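The deviance values reported above are mean Bernoulli (binomial) deviances of the predictions; a minimal sketch of the calculation with illustrative values:

```r
# Mean Bernoulli deviance: -2 * mean(y*log(p) + (1-y)*log(1-p))
bernoulli_deviance <- function(y, p) {
  -2 * mean(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(1, 0, 1, 1, 0)            # observed presence/absence
p <- c(0.8, 0.2, 0.6, 0.9, 0.3)  # predicted probabilities
round(bernoulli_deviance(y, p), 4)  # ~ 0.5677; lower is better
```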
Note that the BRT fitted with optimised parameters (i.e. learning rate and tree complexity) yields similar to slightly better performance than any of the 6 methods tested with the SDM package using default settings. For instance, cross-validation AUC is 0.92 with the optimised BRT model as opposed to 0.91 for the best-performing methods (i.e. RF and SVM using the SDM package). Thus, the main results in the paper rely only on optimised BRT models with the learning rate set at 0.001 and tree complexity at 5.
5 Main Results from Boosted Regression Tree using Spatial blocks for cross-validation
5.1 Spatial autocorrelation (SAC) and autocorrelation range in predictors
# Keep selected predictors and remove "Substrate" (as it is a factor) from Raster Stack
load(file.path(input_dir,'IE_EnvRasterStack_2.75km.RData'))
raStack <- subset(raStack,predictors)
raStack_variog <- dropLayer(raStack, "Substrate")
# Number of sample points for fitting the variograms
n_samp <- 5000
sac_envir <- spatialAutoRange(rasterLayer = raStack_variog,
sampleNumber = n_samp,
doParallel = TRUE,
showPlots = TRUE,
plotVariograms = FALSE)
# Plot the autocorrelation range
sac_envir$plots$barchart
Figure 5.1:
This is the range over which observations are spatially independent. It is determined by constructing an empirical variogram: a variogram model is fitted to each layer using 5000 sample points. The plotted block size is based on the median of the spatial autocorrelation ranges. Colours correspond to the first layer of the raster stack, which is Salinity.
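autofitVariogram fits a variogram model to empirical semivariances; the empirical part can be sketched in base R on a synthetic 1-D example (not the environmental rasters):

```r
# Empirical semivariance: gamma(h) = 0.5 * mean((z_i - z_j)^2)
# over all pairs of points separated by roughly distance h
set.seed(7)
x <- runif(300, 0, 100)                  # 1-D coordinates
z <- sin(x / 10) + rnorm(300, sd = 0.2)  # spatially structured variable

d    <- as.matrix(dist(x))               # pairwise distances
g    <- 0.5 * outer(z, z, "-")^2         # pairwise semivariances
bins <- cut(d[upper.tri(d)], breaks = seq(0, 50, by = 5))
emp_vario <- tapply(g[upper.tri(g)], bins, mean)
round(emp_vario, 3)  # semivariance rises with lag up to the correlation range
```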
# Function to plot the variogram modified from the automap package to be able to change the title
source('CustomFunctions/plot.autofitVariogram.R')
for( i in 1:length(sac_envir$variograms)){
plot.autofitVariogram(sac_envir$variograms[[i]],
main = names(raStack_variog)[i])
}
Figure 5.2:
5.2 Generate Spatial Blocks based on environmental autocorrelation range
# Create a SpatialPointsDataFrame object
sp_df <- st_as_sf(sab_env, coords = c("Longitude", "Latitude"), crs = crs(raStack_variog))
spBlock1 <- spatialBlock(speciesData = sp_df,
# To account for prevalence (evenly distributed records between train and test folds)
species = 'occurrence',
# Range for spatial blocks is based on spatial autocorrelation range in environmental data
theRange = sac_envir$range,
# Generate 10 folds for cross validation
k = n_folds,
# Blocks assignment to folds
selection = "random",
biomod2Format = TRUE,
progress = TRUE,
verbose = TRUE,
# Argument for visualisation
showBlocks = TRUE,
rasterLayer = raStack_variog[["WaveHeight_mean"]])
## The best folds was in iteration 18:
## train_0 train_1 test_0 test_1
## 1 94 65 8 9
## 2 82 71 20 3
## 3 95 70 7 4
## 4 97 65 5 9
## 5 93 71 9 3
## 6 93 67 9 7
## 7 94 65 8 9
## 8 88 67 14 7
## 9 92 60 10 14
## 10 90 65 12 9
# Number of data per fold
table(spBlock1$foldID)
##
## 1 2 3 4 5 6 7 8 9 10
## 17 23 11 14 12 16 17 21 24 21
# Number of presence and absence by folds
data.frame(pa = sp_df$occurrence, folds = spBlock1$foldID) %>%
group_by(folds, pa) %>%
summarise(n=n()) %>%
spread(pa,n)
## Error in get(genname, envir = envir) : object 'vec_proxy' not found
## Error in get(genname, envir = envir) : object 'vec_ptype2' not found
## # A tibble: 10 x 3
## # Groups: folds [10]
## folds `0` `1`
## <int> <int> <int>
## 1 1 8 9
## 2 2 20 3
## 3 3 7 4
## 4 4 5 9
## 5 5 9 3
## 6 6 9 7
## 7 7 8 9
## 8 8 14 7
## 9 9 10 14
## 10 10 12 9
spBlock1$plots +
geom_sf(data = sp_df, aes(color = as.factor(occurrence)), alpha = 0.7) +
scale_color_manual(values = c("black", "deepskyblue"), name = "Occurrence")
Figure: spatial blocks with occurrence records overlaid.
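Conceptually, spatialBlock overlays a checkerboard of square blocks (side length theRange) on the points and allocates whole blocks to folds, so that training and test records are separated in space. A minimal sketch of that assignment logic (not the blockCV implementation; coordinates and block size are made up):

```r
# Assign points to square spatial blocks, then whole blocks to CV folds
set.seed(64)
pts <- data.frame(x = runif(100, 0, 100), y = runif(100, 0, 100))
block_size <- 25

# block index from integer division of the coordinates
block_id <- interaction(pts$x %/% block_size, pts$y %/% block_size, drop = TRUE)

# randomly allocate each block to one of k folds
k <- 4
fold_of_block <- sample(rep_len(1:k, nlevels(block_id)))
pts$fold <- fold_of_block[as.integer(block_id)]
table(pts$fold)   # points per fold (uneven, since blocks differ in content)
```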
5.3 Fit BRT using spatial blocks for cross-validation
set.seed(64)
# Fit BRT with spatial folds for cross-validation defined from 'spBlocks'
brt_review_block1 <- gbm.step(data=sab_env, gbm.x = predictors,
gbm.y = which(names(sab_env) == "occurrence"),
family = "bernoulli",
tree.complexity = tc,
learning.rate = lr,
bag.fraction = bf,
fold.vector = spBlock1$foldID,
n.folds = n_folds,
keep.fold.models = TRUE,
keep.fold.vector = TRUE,
keep.fold.fit = TRUE,
verbose = FALSE)
##
##
## GBM STEP - version 2.9
##
## Performing cross-validation optimisation of a boosted regression tree model
## for occurrence and using a family of bernoulli
## Using 176 observations and 12 predictors
## loading user-supplied fold vector
## creating 10 initial models of 50 trees
##
## folds are stratified by prevalence
## total mean deviance = 1.3609
## tolerance is fixed at 0.0014
## now adding trees...
Figure: BRT model fitting under spatial-block cross-validation.
5.4 Performance
# Extract the performance of the models
perf_review_block1 <- ggPerformance(brt_review_block1)
perf_review_block1
## Model 1
## Total.Deviance 1.3608766
## Residual.Deviance 0.4628235
## Correlation 0.8977138
## AUC 0.9883000
## Per.Expl 65.9907777
## cvDeviance 0.9140186
## cvCorrelation 0.6252818
## cvAUC 0.8441800
## cvPer.Expl 32.8360376
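As a cross-check on the `cvAUC` reported above: because the model was fitted with `keep.fold.fit = TRUE`, the out-of-fold predictions are retained in the fitted object and the cross-validated AUC can be recomputed directly. A minimal sketch, assuming `$fold.fit` is stored on the link (logit) scale as described in Elith et al.'s BRT tutorial:

```r
# Sketch: recompute cross-validated AUC from the stored out-of-fold predictions
# (fold.fit is assumed to be on the link scale, so back-transform with plogis)
cv_pred <- plogis(brt_review_block1$fold.fit)
obs <- sab_env$occurrence

# AUC via the PresenceAbsence package (already attached);
# expected to be close to the cvAUC reported by ggPerformance() (~0.84)
PresenceAbsence::auc(data.frame(id = seq_along(obs), obs = obs, pred = cv_pred))
```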
5.5 Initial and residual SAC
# Plot residuals of the brt model
resid_df <- data.frame(fitted = brt_review_block1$fitted, residuals = brt_review_block1$residuals) %>%
bind_cols(sab_env, .)
ggplot(data = resid_df, aes(x = as.factor(occurrence), y = fitted)) +
geom_violin(aes(fill = as.factor(occurrence)), alpha = 0.7) +
geom_boxplot(width = 0.05) +
scale_fill_manual(values = c("#404040","#ca0020"), guide = FALSE) +
xlab("Observation") + ylab("Fitted")
Figure 5.3:
# Initial SAC
sac_init <- correlog(x=resid_df$Longitude,
y=resid_df$Latitude,
z=resid_df$occurrence,
increment=5,
resamp=1000,
latlon=TRUE)
## 100 of 1000 ... 1000 of 1000
plot(sac_init, ylab = "Moran's I", main = "Raw occurrences")
Figure 5.4:
# Residual sac
sac_resid <- correlog(x=resid_df$Longitude,
y=resid_df$Latitude,
z=resid_df$residuals,
increment=5,
resamp=1000,
latlon=TRUE)
## 100 of 1000 ... 1000 of 1000
plot(sac_resid, ylab = "Moran's I", main = "Residuals")
Figure 5.5:
# Initial variogram
coordinates(resid_df) <- ~ Longitude + Latitude
proj4string(resid_df) = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
vario_init <- autofitVariogram(occurrence~1, resid_df)
plot.autofitVariogram(vario_init, main = "Raw occurrences")
Figure 5.6:
# Residual variogram
vario_resid <- autofitVariogram(residuals~1, resid_df)
plot.autofitVariogram(vario_resid, main = "Residuals")
Figure 5.7:
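The fitted variogram models can also be inspected numerically: a reduced partial sill and/or range in the residual variogram relative to the raw occurrences indicates that the model has absorbed most of the spatial structure. A minimal sketch, using the `$var_model` component returned by `automap::autofitVariogram()`:

```r
# Compare fitted variogram parameters (nugget, partial sill, range)
vario_init$var_model    # raw occurrences
vario_resid$var_model   # model residuals
```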
5.6 Importance of the variables and shape of the responses
# Influence of the variables
ggInfluence(brt_review_block1, col.bar = "black")
Figure 5.8:
## rel.inf
## WaveHeight_mean 26.8086304
## TideAmp 18.6629198
## StratifIndex 14.3900328
## Substrate 13.0702817
## SST_min 7.4195160
## WindSpeed 6.3518517
## FrontSide_mean 4.2239418
## SST_mean 2.6658032
## Salinity 2.0351637
## FrontDist_p05 1.7768810
## FrontPersist_mean 1.6058720
## FrontDist_mean 0.9891058
# Fitted values
ggPDfit(brt_review_block1,rug = T, smooth = TRUE, se = FALSE)
Figure 5.9:
# Fitted values
ggPD(brt_review_block1,rug = T)
Figure 5.10:
5.7 Fitted functions with bootstrap for confidence interval
brt_occ.prerun <- plot.gbm.4list(brt_review_block1)
brt_occ.boot <- gbm.bootstrap.functions(brt_review_block1, list.predictors=brt_occ.prerun, n.reps= 1000)
save(brt_review_block1, brt_occ.prerun, brt_occ.boot, file = file.path(output_dir,"brt_occ_SpBlock1_bootstrap.Rdata"))
load(file.path(output_dir,"brt_occ_SpBlock1_bootstrap.Rdata"))
source('CustomFunctions/ggPR_boot.plot.R')
# Draw partial dependency plots for the 4 most influential predictors
p1 <- ggPD_boot(brt_review_block1,
predictor = "WaveHeight_mean",
list.4.preds=brt_occ.prerun,
booted.preds=brt_occ.boot$function.preds,
n.plots = 1, rug = T,
cex.line=1, alpha.dot=0.4, type.ci = "ribbon", alpha.ci=0.4,
col.ci = "#bdbdbd", col.line = "#636363",
y.label = "", x.label = "Wave Height")
Figure 5.11:
p2 <- ggPD_boot(brt_review_block1,
predictor = "TideAmp",
list.4.preds=brt_occ.prerun,
booted.preds=brt_occ.boot$function.preds,
n.plots = 1, rug = T,
cex.line=1, alpha.dot=0.4, type.ci = "ribbon", alpha.ci=0.4,
col.ci = "#bdbdbd", col.line = "#636363",
y.label = "", x.label = "Tidal Amplitude")
Figure 5.12:
p3 <- ggPD_boot(brt_review_block1,
predictor = 'StratifIndex',
list.4.preds=brt_occ.prerun,
booted.preds=brt_occ.boot$function.preds,
n.plots = 1, rug = T,
cex.line=1, alpha.dot=0.4, type.ci = "ribbon", alpha.ci=0.4,
col.ci = "#bdbdbd", col.line = "#636363",
x.label = "Stratification Index", y.label = "")
Figure 5.13:
## INFLUENCE PLOT FOR SUBSTRATE
# Summarise the bootstrap predictions for the 7 substrate classes (rows 1:7)
# of predictor 12 (Substrate): mean and standard error across replicates
substrate_summary <- apply(brt_occ.boot$function.preds[1:7, 12, ], 1,
                           FUN = function(x) c('mean' = mean(x),
                                               'se' = sqrt(var(x, na.rm = T) / length(x))))
substrate_summary <- data.frame(
  Substrate = factor(c("Sand","Coarse substrate","Mixed sediment","Rock & boulders",
                       "Mud","Sandy Mud","Muddy Sand"),
                     levels = c("Sand","Coarse substrate","Mixed sediment","Rock & boulders",
                                "Mud","Sandy Mud","Muddy Sand")),
  t(substrate_summary))
names(substrate_summary) <- c("Substrate", "Mean", "SE")
p4 <- ggplot(substrate_summary,
aes(x = Substrate, y = Mean)) +
geom_pointrange(aes(ymin = Mean - SE, ymax = Mean + SE))+
scale_x_discrete(name = paste0('Substrate (',round(c(ggInfluence(brt_review_block1, col.bar = "black"))[[1]][4], digits= 1),'%)'), labels = substrate_summary$Substrate)+
scale_y_continuous(name="")+
theme(axis.text.x = element_text(color="black",
size=11, angle=45, vjust = 1, hjust = 1))
Figure 5.14:
p4
Figure 5.15:
p5 <- ggPD_boot(brt_review_block1,
predictor = 'SST_min',
list.4.preds=brt_occ.prerun,
booted.preds=brt_occ.boot$function.preds,
n.plots = 1, rug = T,
cex.line=1, alpha.dot=0.4, type.ci = "ribbon", alpha.ci=0.4,
col.ci = "#bdbdbd", col.line = "#636363",
x.label = "SST min", y.label = "")
Figure 5.16:
grid.arrange(p1,p2,p3,p4,ncol = 4)
Figure 5.17:
5.8 Interactions between variables
# Interaction sizes
ggInteract_list(brt_review_block1,index = T)
## var1.index var1.names var2.index var2.names int.size
## 1 7 WaveHeight_mean 6 TideAmp 88.63
## 2 7 WaveHeight_mean 4 StratifIndex 47.43
## 3 6 TideAmp 3 SST_min 35.07
## 4 12 Substrate 7 WaveHeight_mean 27.90
## 5 7 WaveHeight_mean 3 SST_min 14.62
## 6 7 WaveHeight_mean 5 WindSpeed 9.35
## 7 12 Substrate 3 SST_min 2.79
# Color palette
pal <- colorRampPalette(rev(brewer.pal(11, 'RdYlBu')))(100)
# Interaction WaveHeight_mean & TideAmp
ggInteract_3D(brt_review_block1, x = "WaveHeight_mean", y = "TideAmp", col.gradient = pal)
## maximum value = 0.65
Figure 5.18:
ggInteract_2D(gbm.object = brt_review_block1, x = "WaveHeight_mean", y = "TideAmp",show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,show.axis = T,legend = T, contour = T, smooth = "model", binwidth = 0.1, col.contour = "black") + scale_fill_distiller(palette = 'RdYlBu')
## maximum value = 0.7
Figure 5.19:
# Interaction WaveHeight_mean & StratifIndex
ggInteract_3D(brt_review_block1, x = "WaveHeight_mean", y = "StratifIndex", col.gradient = pal)
## maximum value = 0.67
Figure 5.20:
ggInteract_2D(gbm.object = brt_review_block1, x = "WaveHeight_mean", y = "StratifIndex",show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,show.axis = T,legend = T, contour = T, smooth = "model", binwidth = 0.1, col.contour = "black") + scale_fill_distiller(palette = 'RdYlBu')
## maximum value = 0.72
Figure 5.21:
# Interaction TideAmp & SST_min
ggInteract_3D(brt_review_block1, x = "TideAmp", y = "SST_min", col.gradient = pal)
## maximum value = 0.61
Figure 5.22:
ggInteract_2D(gbm.object = brt_review_block1, x = "TideAmp", y = "SST_min",show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,show.axis = T,legend = T, contour = T, smooth = "model", binwidth = 0.1, col.contour = "black") + scale_fill_distiller(palette = 'RdYlBu')
## maximum value = 0.63
Figure 5.23:
# Interaction Substrate & WaveHeight_mean
ggInteract_2D(gbm.object = brt_review_block1, x = "Substrate", y = "WaveHeight_mean") +
scale_color_brewer(palette = "Dark2",labels=c("Sand","Coarse substrate","Mixed sediment","Rock & boulders","Mud","Sandy Mud","Muddy Sand")) +
scale_linetype_discrete(labels=c("Sand","Coarse substrate","Mixed sediment","Rock & boulders","Mud","Sandy Mud","Muddy Sand"))
## maximum value = 0.87
Figure 5.24:
# Interaction WaveHeight_mean & SST_min
ggInteract_3D(brt_review_block1, x = "WaveHeight_mean", y = "SST_min", col.gradient = pal)
## maximum value = 0.65
Figure 5.25:
ggInteract_2D(gbm.object = brt_review_block1, x = "WaveHeight_mean", y = "SST_min",show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,show.axis = T,legend = T, contour = T, smooth = "model", binwidth = 0.1, col.contour = "black") + scale_fill_distiller(palette = 'RdYlBu')
## maximum value = 0.71
Figure 5.26:
# Interaction WaveHeight_mean & WindSpeed
ggInteract_3D(brt_review_block1, x = "WaveHeight_mean", y = "WindSpeed", col.gradient = pal)
## maximum value = 0.65
Figure 5.27:
ggInteract_2D(gbm.object = brt_review_block1, x = "WaveHeight_mean", y = "WindSpeed",show.dot = T,col.dot = "grey20",alpha.dot = 0.5,cex.dot = 0.2,label.contour = T,show.axis = T,legend = T, contour = T, smooth = "model", binwidth = 0.1, col.contour = "black") + scale_fill_distiller(palette = 'RdYlBu')
## maximum value = 0.7
Figure 5.28:
5.9 Spatial predictions
load(file.path(input_dir,'IE_EnvRasterStack_2.75km.RData'))
rat = (levels(raStack[['Substrate']]))[[1]]
# Assign the numeric category codes matching the Substrate factor levels used in the model
# (2 Sand, 3 Coarse substrate, 4 Mixed sediment, 5 Rock & boulders,
#  11 Mud, 12 Sandy Mud, 13 Muddy Sand)
rat$type <- c("2", "3", "4", "5",
              "11", "12", "13")
levels(raStack[['Substrate']]) <- rat
raStackDF <- as.data.frame(raStack, xy = T)
raStackDF$Substrate <- as.factor(raStackDF$Substrate_type)
save(raStackDF, file = file.path(input_dir,"raStackDF.Rdata"))
# Prediction as probabilities
pred_brt <- predict(raStack, brt_review_block1, n.trees=brt_review_block1$gbm.call$best.trees, type="response")
par(mfrow = c(1,1))
#plot(pred_brt, main = 'BRT spatial predictions (Proba. of presence)')
# Prediction as probabilities from data.frame
raStackDF <- raStackDF[!(is.na(raStackDF$Substrate) | is.na(raStackDF$Aspect) ),]
sp_brt_pred = predict(brt_review_block1, newdata = raStackDF,n.trees=brt_review_block1$gbm.call$best.trees, type = 'response')
mat <- data.frame(pred= sp_brt_pred, Longitude= as.factor(raStackDF[,1]), Latitude= as.factor(raStackDF[,2]))
mat<- with(mat, tapply(pred, list(Longitude, Latitude), sum))
coastline <- rgdal::readOGR(file.path('coastlinesNOAA/','extract_coastline.shp'))
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/mpmarzlo/Documents/Collaborations/201906_SabellariaIreland(LouiseFIRTH)/SDM_SupMat_Nov2020/coastlinesNOAA/extract_coastline.shp", layer: "extract_coastline"
## with 51820 features
## It has 6 fields
par(mfrow = c(1,2))
cex_axis = 1.2
filled.contour(as.numeric(levels(as.factor(raStackDF[,1]))),
as.numeric(levels(as.factor(raStackDF[,2]))),
mat, color.palette= colorRampPalette(c("blue","khaki","red"), bias=1, space = "Lab"),
zlim= c(0,1),nlevels= 50,
xlab=NULL,ylab=NULL,
cex=1.5,
main = 'BRT predictions (proba.) and observed occurrence (green points)',
plot.axes={
axis(1, cex.axis = cex_axis); axis(2, cex.axis= cex_axis);
cex.axis = cex_axis;
plot(coastline, add = T);
points(sab_env[which(sab_env$occurrence==1),]$Longitude,sab_env[which(sab_env$occurrence==1),]$Latitude, pch= 19,cex=1.2,col='darkgreen' )})
Figure 5.29:
# Prediction as presence/absence
threshold <- as.numeric(optimal.thresholds(
DATA = data.frame(1:nrow(sab_env),
sab_env$occurrence,
predict(brt_review_block1, newdata = sab_env, n.trees=brt_review_block1$gbm.call$best.trees, type="response")) ,
opt.methods = 2))[2]
# Discretise predictions into presences and absences
p <- pred_brt
p[p < threshold ] <- 0
p[p >= threshold ] <- 1
#plot(p, xlim = c(-10.7,-5), zlim = c(0,1), main = paste("BRT predictions, threshold set to", threshold))
print(paste("Threshold set to", threshold, "for BRT presence/absence predictions."))
## [1] "Threshold set to 0.47 for BRT presence/absence predictions."
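For reference, `opt.methods = 2` in `PresenceAbsence::optimal.thresholds()` selects the "Sens=Spec" criterion, i.e. the threshold at which sensitivity and specificity are approximately equal. A minimal sketch of a confusion-matrix check at the selected threshold, reusing objects defined above:

```r
# Confusion matrix of observed occurrences vs. thresholded predictions
obs  <- sab_env$occurrence
pred <- predict(brt_review_block1, newdata = sab_env,
                n.trees = brt_review_block1$gbm.call$best.trees, type = "response")
cmx  <- table(observed = obs, predicted = as.numeric(pred >= threshold))
cmx

# Sensitivity and specificity should be roughly equal at this threshold
c(sensitivity = cmx["1", "1"] / sum(cmx["1", ]),
  specificity = cmx["0", "0"] / sum(cmx["0", ]))
```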
# Discretise data.frame-based predictions into presences and absences
p <- sp_brt_pred
p[p < threshold ] <- 0
p[p >= threshold ] <- 1
mat <- data.frame(pred= p, Longitude= as.factor(raStackDF[,1]), Latitude= as.factor(raStackDF[,2]))
mat<- with(mat, tapply(pred, list(Longitude, Latitude), sum))
filled.contour(as.numeric(levels(as.factor(raStackDF[,1]))),
as.numeric(levels(as.factor(raStackDF[,2]))),
mat, color.palette= colorRampPalette(c("blue","khaki","red"), bias=1, space = "Lab"),
zlim= c(0,1),nlevels= 50,
xlab=NULL,ylab=NULL,
cex=1.5,
main = 'BRT predictions (Pres/Abs) and observed occurrence (green points)',
plot.axes={
axis(1, cex.axis = cex_axis); axis(2, cex.axis= cex_axis);
cex.axis = cex_axis;
plot(coastline, add = T);
points(sab_env[which(sab_env$occurrence==1),]$Longitude,sab_env[which(sab_env$occurrence==1),]$Latitude, pch= 19,cex=1.2,col='darkgreen' )})
Figure 5.30:
6 Session info
sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] splines grid stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] gstat_2.0-5 lattice_0.20-38
## [3] kernlab_0.9-27 rpart_4.1-15
## [5] RSNNS_0.4-12 Rcpp_1.0.3
## [7] rJava_0.9-11 earth_5.1.2
## [9] plotmo_3.5.6 TeachingDemos_2.10
## [11] plotrix_3.7-6 Formula_1.2-3
## [13] mgcv_1.8-28 nlme_3.1-140
## [15] mda_0.4-10 class_7.3-15
## [17] ggpointdensity_0.1.0 RColorBrewer_1.1-2
## [19] rgdal_1.4-3 fields_9.8-1
## [21] spam_2.2-2 dotCall64_1.0-0
## [23] mapdata_2.3.0 maps_3.3.0
## [25] forcats_0.4.0 stringr_1.4.0
## [27] readr_1.3.1 tidyr_0.8.3
## [29] tibble_2.1.1 tidyverse_1.2.1
## [31] scales_1.0.0 viridis_0.5.1
## [33] viridisLite_0.3.0 ggridges_0.5.1
## [35] ggcorrplot_0.1.3 sf_0.8-1
## [37] ncf_1.2-9 automap_1.0-14
## [39] blockCV_2.1.1 PresenceAbsence_1.1.9
## [41] ROCR_1.0-7 gplots_3.0.1.1
## [43] randomForest_4.6-14 gbm_2.1.5
## [45] ggBRT_0.0.0.9000 reshape2_1.4.3
## [47] plotly_4.9.0 gridExtra_2.3
## [49] ggthemes_4.2.0 ggplot2_3.3.0
## [51] plyr_1.8.4 dismo_1.1-4
## [53] directlabels_2018.05.22 sdm_1.0-67
## [55] usdm_1.1-18 raster_3.0-7
## [57] sp_1.3-2 readxl_1.3.1
## [59] magrittr_1.5 purrr_0.3.2
## [61] dplyr_0.8.1
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.4 lazyeval_0.2.2 crosstalk_1.0.0
## [4] listenv_0.8.0 digest_0.6.18 htmltools_0.3.6
## [7] fansi_0.4.0 gdata_2.18.0 globals_0.12.5
## [10] modelr_0.1.4 xts_0.11-2 rmdformats_0.3.5
## [13] prettyunits_1.0.2 colorspace_1.4-1 rvest_0.3.4
## [16] haven_2.1.0 xfun_0.7 crayon_1.3.4
## [19] jsonlite_1.6 zeallot_0.1.0 survival_2.44-1.1
## [22] zoo_1.8-5 glue_1.3.1 gtable_0.3.0
## [25] questionr_0.7.0 future.apply_1.5.0 DBI_1.0.0
## [28] miniUI_0.1.1.1 isoband_0.2.0 xtable_1.8-4
## [31] progress_1.2.2 units_0.6-3 intervals_0.15.2
## [34] htmlwidgets_1.3 httr_1.4.0 FNN_1.1.3
## [37] pkgconfig_2.0.3 reshape_0.8.8 utf8_1.1.4
## [40] tidyselect_0.2.5 labeling_0.3 rlang_0.4.2
## [43] later_0.8.0 munsell_0.5.0 cellranger_1.1.0
## [46] tools_3.5.3 cli_1.1.0 generics_0.0.2
## [49] broom_0.5.2 evaluate_0.13 yaml_2.2.0
## [52] knitr_1.23 caTools_1.17.1.2 future_1.17.0
## [55] mime_0.6 xml2_1.2.0 compiler_3.5.3
## [58] rstudioapi_0.10 e1071_1.7-1 spacetime_1.2-3
## [61] stringi_1.4.3 highr_0.8 rgeos_0.5-1
## [64] Matrix_1.2-17 classInt_0.4-3 vctrs_0.1.0
## [67] pillar_1.4.0 cowplot_1.0.0 data.table_1.12.2
## [70] bitops_1.0-6 httpuv_1.5.1 R6_2.4.1
## [73] bookdown_0.11 promises_1.0.1 KernSmooth_2.23-15
## [76] codetools_0.2-16 gtools_3.8.1 assertthat_0.2.1
## [79] withr_2.1.2 parallel_3.5.3 hms_0.4.2
## [82] quadprog_1.5-8 rmarkdown_1.13 shiny_1.3.2
## [85] lubridate_1.7.4